Using Hardware Event Counters for Continuous, Online System Optimization: Lessons and Challenges

Authors

  • Christos D. Antonopoulos
  • Dimitrios S. Nikolopoulos
Abstract

Most modern processors offer hardware support for monitoring performance events related to the interaction of applications with specific subunits of the processor [4, 7, 8, 9, 10]. The insight gained from performance monitoring counters is useful to both application programmers and processor manufacturers. Programmers typically employ the counters as a powerful tool for post-mortem analysis and for identifying and resolving performance bottlenecks in their applications. Processor manufacturers, on the other hand, can collect valuable information on the performance of their products while those products are used in production environments. This knowledge is then exploited during the design phase of future products.

Our project, MOHCA (MOnitoring of Hardware for Continuous Adaptation), exploits performance monitoring counters in a different way. The counters are used for online monitoring of hardware events. The information collected is fed back to OS scheduling policies, making them aware of the dynamically changing characteristics of the execution environment and allowing them to adapt continuously to these characteristics and reach better-informed scheduling decisions. The scheduling policies have been implemented in the context of a processor manager, i.e. a server process which applies kernel-level scheduling decisions from user level. Although this approach introduces practically negligible overhead, since the counters do not need to be sampled frequently, it has been totally neglected by current OS schedulers. We have already implemented a successful prototype in Linux and used it to efficiently schedule workloads on SMPs consisting of multiple Intel HyperThreaded (HT) processors. The same prototype can be used on generic multi-SMT or future multi-CMP systems. The performance gains due to the use of feedback-driven scheduling policies are significant, even on current architectures. Moreover, the performance impact is projected to be huge in future multicore architectures, where a full-blown multiprogrammed/multiprocessor operating system with distributed functionality will be contained in a single chip. More information and results from this project are presented in recent publications [1, 2, 6].

The current implementation of HT processors from Intel shares the performance monitoring hardware between the two execution contexts available on the same physical processor. As a result, conflicts may arise if both threads executing on the same processor attempt to use performance monitoring. The strategy typically adopted by system software to deal with the problem is to disallow the execution of two threads using performance counters on the same physical processor. This restriction has adverse effects on the usability of performance monitoring facilities on HT processors. In particular, disabling threads makes online performance monitoring for continuous adaptation infeasible. We have managed to overcome this limitation by using two sets of performance monitoring registers for each event ...
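As a rough illustration of the feedback loop described in the abstract, the following Python sketch shows how a user-level processor manager might turn sampled counter values into a co-scheduling decision for two-context (HT-style) processors. It is not the MOHCA implementation: the `ThreadSample` type, the `pair_threads` function, the choice of an events-per-cycle rate as the metric, and the "pair the most demanding thread with the least demanding one" heuristic are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ThreadSample:
    """One counter sample for one thread (hypothetical layout)."""
    tid: int
    events: int  # counter delta since the previous sample (assumed metric)
    cycles: int  # cycles elapsed over the same interval

    @property
    def rate(self) -> float:
        # Events per cycle; a thread that has not run yet reports 0.0.
        return self.events / self.cycles if self.cycles else 0.0


def pair_threads(samples: List[ThreadSample]) -> List[Tuple[int, Optional[int]]]:
    """Pair threads for the two contexts of each physical processor.

    Illustrative heuristic only: sort by event rate and match the most
    demanding thread with the least demanding one, so that two heavy
    users of the monitored resource do not share a processor.
    """
    ordered = sorted(samples, key=lambda s: s.rate, reverse=True)
    pairs: List[Tuple[int, Optional[int]]] = []
    lo, hi = 0, len(ordered) - 1
    while lo < hi:
        pairs.append((ordered[lo].tid, ordered[hi].tid))
        lo += 1
        hi -= 1
    if lo == hi:
        # Odd thread count: the middle thread runs without a partner.
        pairs.append((ordered[lo].tid, None))
    return pairs


if __name__ == "__main__":
    samples = [
        ThreadSample(tid=1, events=900, cycles=1000),
        ThreadSample(tid=2, events=100, cycles=1000),
        ThreadSample(tid=3, events=500, cycles=1000),
        ThreadSample(tid=4, events=300, cycles=1000),
    ]
    print(pair_threads(samples))
```

In a real manager the samples would come from the platform's performance monitoring interface, and the pairing would be recomputed at each scheduling quantum. Both of those parts are left out of this sketch.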

Similar resources

Optimization and Parallelization Experiences Using Hardware Performance Counters

Current hardware for compute intensive tasks includes a large number of processing facilities, which are sometimes hard to use in an optimized way. High performance computing (HPC) is always focused on solving grand challenge (or, at least, compute intensive) problems for which the response time is the priority. We have been working from two different but usually complementary research problems: ...

Hotspot Detection of SPEC CPU 2006 Benchmarks with Performance Event Counters

Abstract. Hotspot is the part of a program where most execution time is spent. Detecting the hotspot enables the optimization of the program. The performance event counters embedded in modern processors provide the hardware support for the hotspot detection. By sampling the instruction addresses of the running program with performance event counters, hotspot of the program can be statistically ...

Hybrid Probabilistic Search Methods for Simulation Optimization

Discrete-event simulation based optimization is the process of finding the optimum design of a stochastic system when the performance measure(s) could only be estimated via simulation. Randomness in simulation outputs often challenges the correct selection of the optimum. We propose an algorithm that merges Ranking and Selection procedures with a large class of random search methods for continu...

Obtaining Hardware Performance Metrics for the BlueGene/L Supercomputer

Hardware performance monitoring is the basis of modern performance analysis tools for application optimization. We are interested in providing such performance analysis tools for the new BlueGene/L supercomputer as early as possible, so that applications can be tuned for that machine. We are faced with two challenges in achieving that goal. First, the machine is still going through its final de...

Optimization and Parallelization Experiences Using Hardware Performance Counters

Current hardware for compute intensive tasks includes a large number of processing facilities, which are sometimes hard to use in an optimized way. High performance computing (HPC) is always focused on solving challenging (or, at least, compute intensive) problems for which the response time is the priority. We have been working from two different but usually complementary research problems: a) updating and parallelizing legacy (HPC/numer...

Publication date: 2005